Estimating the real burden of gestational syphilis in Brazil, 2007–2018: a Bayesian modeling study

Summary Background Although several studies have estimated gestational syphilis (GS) incidence in several countries, underreporting correction is rarely considered. This study aimed to estimate the level of under-registration and correct the GS incidence rates in the 557 Brazilian microregions. Methods Brazilian GS notifications between 2007 and 2018 were obtained from the SINAN-Syphilis system. A cluster analysis was performed to group microregions according to the quality of GS notification. A Bayesian hierarchical Poisson regression model was applied to estimate the reporting probabilities among the clusters and to correct the associated incidence rates. Findings We estimate that 45,196 (90%-HPD: 13,299; 79,310) GS cases were underreported in Brazil from 2007 to 2018, representing a coverage of 87.12% (90%-HPD: 79.40%; 95.83%) of registered cases, where HPD stands for the Bayesian highest posterior density credible interval. Underreporting levels differ across the country, with microregions in North and Northeast regions presenting the highest percentage of missed cases. After underreporting correction, Brazil’s estimated GS incidence rate increased from 8.74 to 10.02 per 1000 live births in the same period. Interpretation Our findings highlight disparities in the registration level and incidence rate of GS in Brazil, reflecting regional heterogeneity in the quality of syphilis surveillance, access to prenatal care, and childbirth assistance services. This study provides robust evidence to enhance national surveillance systems, guide specific policies for GS detection disease control, and potentially mitigate the harmful consequences of mother-to-child transmission. The methodology might be applied in other regions to correct disease underreporting. Funding 10.13039/501100003593National Council for Scientific and Technological Development; The Bill Melinda Gates Foundation and 10.13039/100010269Wellcome Trust.


Introduction
Syphilis is a sexually transmitted infection caused by Treponema pallidum, first identified in 1905. 1 Despite that, it is still a persistent public health issue in many countries worldwide since the rate of syphilis has increased dramatically in recent decades, 1 becoming both a burden and damage to public health, 2-4 particularly, during pregnancy. Syphilis diagnosed during pregnancy (also called Gestational Syphilis (GS) and hereafter referred to as such) is the second leading cause of stillbirth globally and results in severe adverse outcomes in more than 50% of cases, such as prematurity, low birth weight, miscarriage, stillbirth, and neonatal death. 1 Data from 2016 shows that over 900,000 pregnant women were diagnosed with syphilis worldwide, resulting in more than 350,000 adverse outcomes, from which over 200,000 were stillbirths or neonatal deaths. 5,6 The region of the Americas had the second-highest estimated prevalence of GS worldwide (0.86%), after Africa (1.52%). 6 In 2016, the Brazilian Ministry of Health declared gestational and congenital syphilis as an epidemic in the country due to the exponential increase in their incidence rates, with unfavorable outcomes for both women and newborns. 7 Despite the mandatory registration of gestational syphilis and the notable increase in notifications in recent years, underreporting GS cases is still a problem in Brazil. [8][9][10][11][12] Failures on report GS could bias the actual scenario of the disease and compromise the knowledge of the real burden of GS in the country. Many factors can influence the underreporting (under-registration) of GS cases in Brazil, mainly the inequality of access to health services and prenatal care, failures in syphilis testing during the antenatal period, and failures to register the notification in the Health Information Systems (HIS). [13][14][15] Accurate reporting of GS cases allows policymakers to design control policies and helps to reduce its consequences for pregnant individuals and their offspring. Although Brazil and other countries recognize the importance of improving the quality of syphilis detection and notification during pregnancy, 16 epidemiological studies adopting robust methodologies to analyze

Research in context
Evidence before this study The World Health Organization (WHO) estimates that syphilis infection affects one million pregnancies yearly worldwide, with important consequences for pregnant women and newborns. Untreated gestational syphilis (GS) has been linked to a significantly higher incidence of stillbirth, neonatal death, prematurity, and low birth weight. Notifying GS is compulsory for all WHO member countries; it is mandatory to communicate all cases to health authorities and investigate them. However, the lack of quality in the case detection and data recording processes may contribute to under-registration of GS cases on the health information systems, which underestimates the real magnitude of the disease, especially in middle-income countries such as Brazil. We searched PubMed/Medline, EMBASE, and LILACS using the search strategy ("underreporting of syphilis" OR "syphilis underreporting" OR "under-registration of syphilis" OR "under-registration") AND ("Pregnant women") AND (Brazil) seeking for studies on under-registration of GS across Brazil until July 2023. We supplemented the literature review with internet searches (Google scholar and Semantic scholar), analysis of references in identified papers, and the authors' own knowledge. We found 6 studies focusing on small to medium-sized Brazilian cities, which indicated: 13.74% of GS under-registration between 2010 and 2013 in the west of São Paulo State; 6.5% of GS notification in Montes Claros city, Minas Gerais State, between 2007 and 2013; 50% more cases of congenital syphilis than GS cases from 2009 to 2015 in Londrina, Paraná State; and 46% of GS underreporting in a special indigenous sanitary district in Mato Grosso State. None of these studies covered the whole country and they do not considered factors associated with underreported cases of GS or even provided corrected estimation of GS incidence. Based on such literature review, the modeling of syphilis under-registration is also rare worldwide since none study was found on this regard.

Added value of this study
This is the first study to map the quality of GS notifications at the national, regional, and small area levels in a large country such as Brazil, emphasizing that GS detection and incidence are related to socio-economic conditions and access to health services such as deprivation level and adequate prenatal care. States and microregions in the Brazilian North and Northeast regions presented the highest GS underreporting cases. Health managers and policymakers could easily use our methodology to amplify national syphilis surveillance and estimate the real incidence of GS. Also, the statistical modeling strategy may be used as a reference for application in other middle-income countries.
Implications of all the available evidence Taken together, modeling results from this national-level study on GS burden in Brazil, along with findings from other local surveys, suggest that GS remains a significant public health problem, highlighting the need for improved surveillance. Our research reveals important differences in underreporting of GS cases across Brazilian regions, as well as the association of syphilis surveillance quality with socioeconomic conditions and access to healthcare. This underscores the urgent need to enhance syphilis surveillance and case notification within the healthcare system, particularly in the country's poorest region. Furthermore, correcting GS incidence at different area levels could provide a better understanding of the epidemiological profile and help estimate the social, economic, and health burdens associated with the disease nationwide. Additionally, our findings offer valuable insights for policymakers to enhance planning and take action to reduce the burden of GS, thereby contributing to the reduction of mother-to-child transmission and adverse effects during childbirth.
Articles underreporting of GS are rare. Few studies have addressed this issue in Brazil and most of the available literature presents analyses with small sample sizes and focuses on specific regions. [9][10][11] Despite those efforts, there are still gaps in the investigation of underreporting of GS across the country and correction of its incidence. This study aims to estimate the real burden of GS in Brazilian microregions from 2007 to 2018 by applying a statistical modeling strategy to correct underregistration of cases in an official data reporting system, which may be used as a reference for application in other middle-income countries. Measures of underreporting levels are provided by comparing the model estimates with the reported data.

Study design and target population
We conducted a cross-sectional ecological study design, considering GS cases notified in Brazil between 2007 and 2018. Brazil, a continental country, divided into 27 states and five macroregions (North, Northeast, South, Southeast and Midwest), comprising 557 microregions, which are defined as contiguous aggregations of the 5570 Brazilian municipalities. 17

Data sources and variables description
This study used administrative non-identified data available from public Brazilian databases. The outcome variable, the count of GS cases, was extracted from the Notifiable Diseases Information System (SINAN in Portuguese: Sistema de Informação de Agravos de Notificação) using data received from the Brazilian Ministry of Health in 2019. In Brazil, it is mandatory for healthcare professionals to register GS cases through SINAN-Syphilis, where they complete a form including personal, socioeconomic, and gestational information of the patient at the time of diagnosis. 14 GS cases registered in SINAN-Syphilis followed the definition described by the Brazilian Ministry of Health 14,18 considering active detection. This includes cases where a pregnant individual tests positive for the venereal disease research laboratory (VDRL) test with any titration during pregnancy, provided they have not received prior treatment. Additionally, cases may be considered if a confirmatory test (treponemal serology) is also positive. From the total 305,891 GS cases notified in Brazil between 2007 and 2018, we excluded: i) 131 cases for which the municipality of residence was not identified or it did not match the official municipality codes of the Brazilian Institute of Geography and Statistics (IBGE) 17 ; and ii) 2 cases registered in Fernando de Noronha, a Brazilian archipelago located more than 500 km away from the Brazilian mainland, which could prevent a spatial analysis with the method considered in our study. Therefore, our analysis was based on 305,758 (99.96%) GS cases notified in SINAN-Syphilis from 2007 to 2018.
For statistical analyses of GS incidence rates and underreporting level in Brazilian microregions, we collected socio-demographic information and access to health service indicators, as well as factors associated to quality of SINAN-Syphilis form registration. The GS incidence rates were calculated per 1000 live births in the microregion. All variables considered in our study are detailed in Table 1.

Statistical modeling
In the statistical literature, correction of underreporting in count data has been done based on censored 21 and compound [22][23][24][25][26] Poisson approaches. In both cases, due to identifiability issues, the model depends on the availability of extra information to support the data reporting process modeling. Thus, statistical inference is performed under the Bayesian framework, in which the extra information is accommodated through appropriate prior distributions. In this work, we applied the Bayesian hierarchical Poisson model proposed by Oliveira et al. (2022). 24 Provided that the Brazilian microregions are clustered according to the quality of GS data, this model only requires specification of an informative prior distribution for the reporting level in areas belonging to the best data quality cluster. This requirement is more feasible in our case, as we do not have: (i) validation datasets 25 ; (ii) information about global GS reporting level in Brazil 23 ; (iii) reliable local active search survey 26 ; nor (iv) informative prior for the reporting probability in all groups of areas. 22

Clustering analysis
We applied the K-means method to group the microregions according to their quality of GS data. We considered seven indicators as proxies for GS data quality: coverage of prenatal care; adequacy of prenatal care; test acceptability; diagnosis opportunity; consistency; adequacy of case classification; and completeness score of GS care assistance indicators (see Table 1). Variables were standardized before applying the Kmeans method in software R, 27 using packages 28 and factoextra. 29 Definition of the possible optimal number K of clusters was based on the elbow method, which relies on evaluating the total within-cluster sums of squares. 30 A sensitivity analysis was performed considering different values for the number K of clusters around the optimal region identified with the elbow method. In each case, the clusters were hierarchically related from the best to the worst with respect to data quality. To accomplish this, we analyzed the withincluster centroid, which contained the mean value for the seven variables used in the clustering analysis. The higher the centroid mean, the better the data quality.  24 proposed assuming that

Model specification
where parameters, μ i >0 and 0 < ε i < 1 denote, respectively, the GS incidence rate and the proportion of the true (unobserved) cases reported in the i-th microregion. We modeled the GS incidence rate μ i through the following regression structure where P i = total live births (offset), X 1i = Brazilian deprivation index, X 2i = proportion of live births with adequate prenatal care, X 3i = proportion of non-white live births (see descriptions in Table 1), and log represents the natural logarithm function. Those three covariates were selected on the basis of literature review, in which it is known that socio-demographic and adequacy to health care indicators tends to be related to the disease incidence. 16,[31][32][33] They were available in our data sources and then were used to account for discrepancies on socio-demographic and adequacy to health care among the Brazilian microregions. Variables X 1 , X 2 and X 3 were standardized before applying the model to estimate their respective effects (parameters) β 0 , β 1 , β 2 and β 3 . Terms δ i and ϕ i are, respectively, random effects accounting for residual overdispersion induced by local and spatial variations. As it is usual in spatial statistics modeling, the prior distribution for the spatially structured random effect ϕ i is represented by an intrinsic conditional auto-regressive (ICAR) model and δ i follows a non-informative Gaussian prior. 34 Following Oliveira et al. (2022), 24 we assume that ε i = 1−γ 1 for all areas belonging to the best data quality group, ε i = 1−γ 1 −γ 2 for all areas belonging to the second best data quality group, and so on, implying that Adequacy of prenatal care The proportion of live births reported in Brazilian microregions between 2007 and 2018 whose mothers attended seven or more prenatal appointments.
Non-white live births The proportion of non-white live births registered in Brazilian microregions between 2007 and 2018.

Proxies for quality of GS cases registration
Information System on Live Births (in Portuguese: Sistema de Informações sobre Nascidos Vivos-SINASC) Coverage of prenatal care The proportion of live births reported in Brazilian microregions between 2007 and 2018 whose mother attended at least one prenatal appointment in the same study period.

SINAN-Syphilis Test acceptability
The proportion of GS notified cases with a qualitative result (reactive or non-reactive) for at least one non-treponemal or treponemal test in each Brazilian microregion from 2007 to 2018.

Diagnosis opportunity
The proportion of GS notified cases in which the diagnosis was laboratory confirmed during the first or second trimester of pregnancy in each Brazilian microregion from 2007 to 2018.

Consistency
The proportion of municipalities within the microregion for which the number of registered GS cases is greater or equal to congenital syphilis cases registered from 2007 to 2018.
Adequacy of case classification  Articles quality cluster. Parameter γ 1 represent the proportion of underreported cases in areas classified in the highest level of data quality; γ 2 is the increment on such proportion for areas experiencing the second highest data quality level, and so on. Such parameters are hierarchically correlated through an appropriate conditional Uniform prior distribution. The most important and attractive feature of this modeling strategy is that, to attain identifiability, it only requires an informative prior distribution for parameter γ 1 , which represents the proportion of underreporting in those areas belonging to the best data quality cluster. An informative prior means a distribution that is highly concentrated in a specific subset of the parametric space. In our application to Brazilian GS data, with basis on literature review and information collected from experts' on the study of syphilis burden in Brazil, we performed a sensitivity analysis considering four different prior distributions for γ 1 ∼Uniform(0, 0.02), γ 1 ∼Uniform(0, 0.05), γ 1 ∼Uniform(0.05, 0.10) and γ 1 ∼Uniform(0.05, 0.10). Respectively, these priors imply that areas classified in the highest level of data quality report, on average, 99.0%, 97.5%, 95.0% and 92.5% of the GS cases. All these priors impose that less than 10.0% of the true GS cases are missed in such areas. Non-informative priors are elicited for parameters γ 2 , …, γ K .
By varying the number of clusters K and the prior distribution for parameter γ 1 , we fitted 12 different models. The WAIC 35 and the LPML 36 goodness-of-fit metrics were considered for model selection. For the WAIC, the smaller the value, the better the model fitted the data. Larger values of LPML indicate better fit. A more detailed discussion is presented in Section 1 of the Supplementary File, which includes the prior specification for all parameters of our Bayesian model as well as particularities of model implementation and validation.
It worth mentioning that, for each microregion, provided that μ i and ε i are estimated, the underreported GS cases, denoted here by Z i , can be estimated from the predictive posterior distribution Z i |μ i , ε i ∼Poisson(μ i (1 − ε i )). Then, the total GS cases can be estimated/predicted as T i = Y i + Z i , in such a way that the ratio between Y i and T i provides a coverage for the SINAN-Syphilis. The Bayesian highest posterior density credible interval with level 90% was used to evaluate significance of covariates X 1 , X 2 and X 3 and to quantify uncertainty in the model predictions.

Ethics approval
Not applicable. All analyses used de-identified and publicly available data.

Role of the funding source
The study's funders had no role in the study design, data collection, data analysis, interpretation, or writing of the manuscript.

Results
From January 2007 to December 2018, 305,758 GS cases were registered in Brazil corresponding to a GS incidence rate of 8.74 per 1000 live births. GS incidence rates vary across the Brazilian territory (Fig. 1), where the highest observed incidence rate at microregion and state levels were, respectively, 23.45 and 18.12, while the lowest ones were around 1.01 and 4.07, respectively. Northeast and North regions generally presented low incidence rates, as well as most microregions within Minas Gerais State (in Southeast region). Large clusters of microregions with high GS incidence rates were observed in states of Rio Grande do Sul (South region) and Mato Grosso do Sul (Midwest region), which is the only with an incidence rate above 10.2.
To correct for GS underreporting, we performed a sensitivity analysis by applying the Bayesian model with 12 different specifications, which are related to the combinations between the number of clusters (K) and the prior distribution elicited to parameter γ 1 (Supplementary Table S1 of the Supplementary File). The elbow method indicated that K = 9, K = 10 or K = 11 would be a good choice for the number of clusters (Fig. 2). According to the model evaluation metrics (Supplementary Table S1 of the Supplementary File), K = 9 was the best choice to aggregate the 557 Brazilian microregions according to the seven selected proxies of GS data quality. Data quality decreases from Cluster 1 (best data quality group) to Cluster 9 (worst data quality group) (Fig. 2). All the 17 microregions within Cluster 9 are located in the North (14 areas) and Northeast (3 areas) regions.
The sensitivity analysis indicated that, a priori, the proportion of underreported GS cases in the best data quality cluster is 1%, on average (Supplementary  Table S1 of the Supplementary File). Fig. 3 displays the estimated posterior mean for the proportion of the true (unobserved) GS cases that is reported in each Brazilian microregion and state (obtained by averaging the posterior mean of microregions within each state). For microregions belonging to the best data quality group (Cluster 1), the posterior percentage of reported GS cases is 99.01%, on average, which is quite influenced to the mean of the informative prior specified to parameter γ 1 as expected. For areas in Clusters 2, on average, we estimate that more than 90% of the GS cases were correctly registered in SINAN-Syphilis. For Cluster 3 to 6, on average, the proportion of GS cases registered was estimated between 90% and 80%, whereas it was less than 80% for areas belonging to Clusters 7 to 9. In the two worst data quality clusters, our Bayesian model estimated that 34 Paulo and Espírito Santo, in Southeast region) reported more than 92.13% of GS cases.
After correcting for underreporting, the GS incidence rate in Brazil was estimated as 10.02 per 1000 live births in the period. We observed a substantial increase in the GS incidence rate for microregions belonging to the worst data quality clusters, which are located, in general, in the Northeast and North regions (Fig. 4). The highest estimated incidence rates at microregion and state levels were, respectively, 26.15 and 20.12, while the lowest ones were around 1.19 and 4.95, respectively. Six Brazilian states presented a posterior mean greater than 10.2 for the GS incidence rate per 1000 live births. Analysis of the effects of covariates considered for modeling the GS incidence rates is provided in Supplementary

Discussion
Our study estimates that around 13% of the notifications of gestational syphilis were not registered in SINAN-Syphilis between 2007 and 2018 in Brazil. The underreporting pattern differs between regions, states   and microregions: a higher percentage of underreporting of GS cases was found in the North and Northeast regions, while, in general, in the Southeast region, the observed and corrected cases of GS were similar, considering that the lowest percentage of underreported GS cases were found in the microregions of such region. The lowest estimated coverage of GS cases was found in Roraima State (in North region), while the highest was found in São Paulo (in Southeast region). These different patterns highlight that reporting quality may be related to socio-economic and access to health care services.
To the best of our knowledge, this is the first study mapping the underreporting of GS across the country and estimating the real burden of the disease in all Brazilian microregions and states. A small-scale surveillance study in Minas Gerais State found that only 6.5% of identified GS cases were reported in SINAN-Syphilis between 2007 and 2013. 10 In Londrina, Paraná State, a study reported that congenital syphilis cases were 50% higher than GS cases from 2009 to 2015, 11 while Tiago et al. Comparison of our results with previous studies is not possible considering their heterogeneity in scale, size, and regional focus, as well as methodological differences in the approaches.
We highlight the role of inequalities in access to health services and GS cases detection. Despite the expansion of prenatal coverage in Brazil in the last decade and improvements in actions to reduce GS, the quality of prenatal services is still low in many Brazilian  Articles cities, particularly in the North and Northeast regions. 13 Previous studies also pointed out that the quality of syphilis surveillance in Brazil is intrinsically related to the adequacy of prenatal care, 31 since it is recommended that the serological test for T. pallidum among pregnant women should be requested, at least, at the first prenatal care visit, in the beginning of the third trimester of gestational period, and at the moment of childbirth. 14 Therefore, inequality in healthcare access contributes directly to underreporting of syphilis cases in the country, particularly among women living in vulnerable contexts, in rural and remote areas, or those facing cultural barriers in access to healthcare services. 13,15,16,31,37 Also, the inability of some health professionals to identify syphilis symptoms in the early stage during prenatal care have been associated with underreporting of the disease. Besides, the access to syphilis tests during the gestational period and the reporting of cases have been directly associated with highest municipal Human Development Index (HDI), most of them located in the Southeast and South region of Brazil. 13,31,32,37 The availability of syphilis rapid test in healthcare service, the timely access to results of diagnostic tests, in addition to postponing the search for test results due to the absence of symptoms in some stages of the infection contribute to GS underreporting. Therefore, the highest quality of prenatal care services is closely related to access to syphilis tests, and could contribute to reduce failures in reporting GS. 13,31,32,37 Aligned with our results, we found that the highest incidence rates of underreported cases of GS occurred in North and Northeast, regions that concentrated the microregions with the highest deprivation levels, and with less coverage and access to diagnostic test and prenatal care, factors previously associated with underreporting and the GS incidence rate in Brazil. 31,32,37 The increasing incidence of GS represents a growing concern for health professionals and public health in Brazil, 4,14 and failure to correctly diagnose syphilis during pregnancy contributes to non-treatment, since the disease could be asymptomatic, or lead to inadequate adherence to treatment, particularly when the diagnosis occurs at the end of the gestational period. Additionally, stigma and fear of discrimination may discourage pregnant individuals from seeking care for treatment, further contributing to the inadequate management of syphilis during pregnancy, which increases the occurrence of adverse health outcomes for pregnant individuals. Also, failures in the diagnosis of GS could increase the vertical transmission of syphilis, as well as it enhances the chance of adverse effects on the newborn. 6,38 Moreover, it is important to highlight that syphilis should be considered during pregnancy as a fetal emergency due to the high proportion of early or late fetal deaths related to lack of treatment. 14,39 Thus, underreporting of GS can occult the real epidemiological profile of the disease in the country and reduce the possibility of achieving the goals of the World Health Organization regarding GS and congenital syphilis. 40 Besides, it is important highlighting that the variation of GS reporting coverage identified across Brazilian microregions and states expresses differences in capabilities and human resources of health systems, particularly related to prenatal and childbirth care across the country. This scenario creates the urgency of actions and programs for each region to improve GS diagnosis and registration of cases in the SINAN-Syphilis.

Strengths and limitations
The study is a first attempt to estimate the underreporting of GS in Brazil and correct incidence rates for the whole country, which could help to address the country's gap of knowledge regarding GS notification. We provide robust subsidies to Brazilian health authorities to better increase surveillance for GS across the country, highlighting municipalities where improvements in surveillance and case reporting should be prioritized. Our results could allow policymakers and stakeholders to improve policy and action to reduce GS, and, therefore, contribute to reducing mother-to-child transmission and the adverse effects on birth outcomes related to congenital syphilis. Also, it could be used by the Brazilian Ministry of Health in planning the Previne Brasil, a public policy that will transfer financial resources to municipalities based on their health indicators, such as the proportion of pregnant with at least six prenatal care visits and tested for syphilis. 41 Despite the relevance of our results, the study presented some limitations. First, the percentages of underreported cases were calculated for the entire study period, not allowing for analysis of possible improvements or setbacks in the quality of information over time. Second, the model imposes the same reporting probability for areas within the same data quality cluster. However, we found this is the best methodology to be applied, given the restricted type of prior information we have available on underreporting of GS in Brazil.

Conclusion
Brazilian microregions and states in the North and Northeast regions presented the highest values of underreporting of GS, which directly affects the knowledge of the epidemiological profile and the estimation of the social, economic, and health burdens associated with the disease. Conversely, microregions and states in the Southern region presented the lowest values of underreported cases. This scenario highlights the regional disparities in the incidence rate of GS and the quality of prenatal care in the Brazilian context, as well as emphasizes the need to improve the quality of information registered in SINAN-Syphilis, considering the importance of disease surveillance across the country. Furthermore, strategies to prevent, control, and maintain the surveillance of GS based only on the data made available in the national health information system without correcting for underreported cases can lead to misguided actions and health policies, particularly related to health finance and the distribution of human resources across the regions.
This study provides robust evidence that can be used to enhance national surveillance system and particularly specific policies for GS detection, early treatment and disease control, based in prior information and using a statistical methodology that could be applied by health manager of many countries.